Info file: cmu-user.info, -*-Text-*-
produced by latexinfo-format-buffer
from file: cmu-user.tex
File: cmu-user.info Node: Inline Expansion Recording, Prev: Inline Expansion, Up: Inline Expansion, Next: Semi-Inline Expansion
Inline Expansion Recording
--------------------------
Inline expansion requires that the source for the inline expanded
function be available when calls to the function are compiled. The
compiler doesn't remember the inline expansion for every function, since
that would take an excessive amount of space. Instead, the programmer
must tell the compiler to record the inline expansion before the
definition of the inline expanded function is compiled. This is done by
globally declaring the function inline before the function is defined,
using the `inline' or `extensions:maybe-inline' declaration.
In addition to recording the inline expansion of inline functions at the
time the function is compiled, `compile-file' also puts the inline
expansion in the output file. When the output file is loaded, the
inline expansion is made available for subsequent compilations; there is
no need to compile the definition again to record the inline expansion.
If a function is declared inline, but no expansion is recorded, then the
compiler will give an efficiency note like:
Note: MYFUN is declared inline, but has no expansion.
When you get this note, check that the `inline' declaration and the
definition appear before the calls that are to be inline expanded. This
note will also be given if the inline expansion for a `defun' could not
be recorded because the `defun' was in a non-null lexical environment.
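The recording idiom described above can be sketched as follows (the
function names here are hypothetical):

(proclaim '(inline square))     ; record expansion before the defun
(defun square (x) (* x x))

;; Calls compiled from here on can be inline expanded.
(defun sum-of-squares (a b)
  (+ (square a) (square b)))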
File: cmu-user.info Node: Semi-Inline Expansion, Prev: Inline Expansion Recording, Up: Inline Expansion, Next: The Maybe-Inline Declaration
Semi-Inline Expansion
---------------------
Python supports SEMI-INLINE functions. Semi-inline expansion shares a
single copy of a function across all the calls in a component by
converting the inline expansion into a local function (*Note Local
Call::.) This takes up less space when there are multiple calls, but
also provides less opportunity for context dependent optimization. When
there is only one call, the result is identical to normal inline
expansion. Semi-inline expansion is done when the `space' optimization
quality is `0', and the function has been declared
`extensions:maybe-inline'.
This mechanism of inline expansion combined with local call also allows
recursive functions to be inline expanded. If a recursive function is
declared `inline', calls will actually be compiled semi-inline.
Although recursive functions are often so complex that there is little
advantage to semi-inline expansion, it can still be useful in the same
sort of cases where normal inline expansion is especially advantageous,
i.e. functions where the calling context can help a lot.
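As an illustrative sketch (the function is made up), a recursive
function declared `inline' is actually compiled semi-inline:

(proclaim '(inline tree-size))
(defun tree-size (x)
  ;; Recursive, so calls are compiled semi-inline as a shared
  ;; local function rather than being expanded at each call site.
  (if (consp x)
      (+ (tree-size (car x)) (tree-size (cdr x)))
      1))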
File: cmu-user.info Node: The Maybe-Inline Declaration, Prev: Semi-Inline Expansion, Up: Inline Expansion
The Maybe-Inline Declaration
----------------------------
The `extensions:maybe-inline' declaration is a CMU Common Lisp
extension. It is similar to `inline', but indicates that inline
expansion may sometimes be desirable, rather than saying that inline
expansion should almost always be done. When used in a global
declaration, `extensions:maybe-inline' causes the expansion for the
named functions to be recorded, but the functions aren't actually inline
expanded unless `space' is `0' or the function is eventually (perhaps
locally) declared `inline'.
Use of the `extensions:maybe-inline' declaration followed by the `defun' is
preferable to the standard idiom of:
(proclaim '(inline myfun))
(defun myfun () ...)
(proclaim '(notinline myfun))
;;; Any calls to `myfun' here are not inline expanded.
(defun somefun ()
(declare (inline myfun))
;;
;; Calls to `myfun' here are inline expanded.
...)
The problem with using `notinline' in this way is that in CMU Common
Lisp it does more than just suppress inline expansion, it also forbids
the compiler to use any knowledge of `myfun' until a later `inline'
declaration overrides the `notinline'. This prevents compiler warnings
about incorrect calls to the function, and also prevents block
compilation.
The `extensions:maybe-inline' declaration is used like this:
(proclaim '(extensions:maybe-inline myfun))
(defun myfun () ...)
;;; Any calls to `myfun' here are not inline expanded.
(defun somefun ()
(declare (inline myfun))
;;
;; Calls to `myfun' here are inline expanded.
...)
(defun someotherfun ()
(declare (optimize (space 0)))
;;
;; Calls to `myfun' here are expanded semi-inline.
...)
In this example, the use of `extensions:maybe-inline' causes the
expansion to be recorded when the `defun' for `somefun' is compiled, and
doesn't waste space through doing inline expansion by default. Unlike
`notinline', this declaration still allows the compiler to assume that
the known definition really is the one that will be called when giving
compiler warnings, and also allows the compiler to do semi-inline
expansion when the policy is appropriate.
When the goal is merely to control whether inline expansion is done by
default, it is preferable to use `extensions:maybe-inline' rather than
`notinline'. The `notinline' declaration should be reserved for those
special occasions when a function may be redefined at run-time, so the
compiler must be told that the obvious definition of a function is not
necessarily the one that will be in effect at the time of the call.
File: cmu-user.info Node: Object Representation, Prev: Inline Expansion, Up: Advanced Compiler Use and Efficiency Hints, Next: Numbers
Object Representation
=====================
A somewhat subtle aspect of writing efficient CMU Common Lisp programs
is choosing the correct data structures so that the underlying objects
can be implemented efficiently. This is partly because of the need for
multiple representations for a given value, but is also due to the
sheer number of object types that CMU Common Lisp has built in. The
number of possible representations complicates the choice of a good
representation because semantically similar objects may vary in their
efficiency depending on how the program operates on them.
* Menu:
* Think Before You Use a List::
* Structures::
* Arrays::
* Vectors::
* Bit-Vectors::
* Hashtables::
File: cmu-user.info Node: Think Before You Use a List, Prev: Object Representation, Up: Object Representation, Next: Structures
Think Before You Use a List
---------------------------
Although Lisp's creator seemed to think that it was for LISt Processing,
the astute observer may have noticed that the chapter on list
manipulation makes up less than three percent of Common Lisp: the
Language II. The language has grown since Lisp 1.5 -- new data types
supersede lists for many purposes.
File: cmu-user.info Node: Structures, Prev: Think Before You Use a List, Up: Object Representation, Next: Arrays
Structures
----------
One of the best ways of building complex data structures is to define
appropriate structure types using `defstruct'. In Python, access of
structure slots is always at least as fast as list or vector access, and
is usually faster. In comparison to a list representation of a tuple,
structures also have a space advantage.
Even if structures weren't more efficient than other representations, structure
use would still be attractive because programs that use structures in
appropriate ways are much more maintainable and robust than programs written
using only lists. For example:
(rplaca (caddr (cadddr x)) (caddr y))
could have been written using structures in this way:
(setf (beverage-flavor (astronaut-beverage x)) (beverage-flavor y))
The second version is more maintainable because it is easier to
understand what it is doing. It is more robust because structure
accesses are type checked. An `astronaut' will never be confused with a
`beverage', and the result of `beverage-flavor' is always a flavor. See
sections *Note Structure Types:: and *Note The Freeze-Type Declaration::
for more information about structure types. *Note Type Inference:: for
a number of examples that make clear the advantages of structure typing.
Note that the structure definition should be compiled before any uses of
its accessors or type predicate so that these function calls can be
efficiently open-coded.
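The structure types assumed by the example above might be defined like
this (the slot layout is an illustrative guess):

(defstruct beverage
  (flavor nil))

(defstruct astronaut
  (name nil)
  (beverage nil :type (or null beverage)))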
File: cmu-user.info Node: Arrays, Prev: Structures, Up: Object Representation, Next: Vectors
Arrays
------
Arrays are often the most efficient representation for collections of objects
because:
* Array representations are often the most compact. An array is
always more compact than a list containing the same number of
elements.
* Arrays allow fast constant-time access.
* Arrays are easily destructively modified, which can reduce consing.
* Array element types can be specialized, which reduces both overall size and
consing.
Access of arrays that are not of type `simple-array' is less efficient,
so declarations are appropriate when an array is of a simple type like
`simple-string' or `simple-bit-vector'. Arrays are almost always
simple, but the compiler may not be able to prove simpleness at every
use. The only way to get a non-simple array is to use the
`:displaced-to', `:fill-pointer' or `:adjustable' arguments to `make-array'.
If you don't use these hairy options, then arrays can always be declared
to be simple.
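For example, declaring a string argument `simple-string' (rather than
`string') lets accesses be open-coded (the function itself is
hypothetical):

(defun count-spaces (s)
  (declare (simple-string s))
  (let ((n 0))
    ;; SCHAR on a declared simple-string compiles to a direct
    ;; indexed access with no generic dispatch.
    (dotimes (i (length s) n)
      (when (char= (schar s i) #\Space)
        (incf n)))))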
Because of the many specialized array types and the possibility of
non-simple arrays, array access is much like generic arithmetic. In
order for array accesses to be efficiently compiled, the element type
and simpleness of the array must be known at compile time. If there is
inadequate information, the compiler is forced to call a generic array
access routine. You can detect inefficient array accesses by enabling
efficiency notes.
File: cmu-user.info Node: Vectors, Prev: Arrays, Up: Object Representation, Next: Bit-Vectors
Vectors
-------
Vectors (one dimensional arrays) are particularly useful, since in
addition to their obvious array-like applications, they are also well
suited to representing sequences. In comparison to a list
representation, vectors are faster to access and take up between two and
sixty-four times less space (depending on the element type.) As with
arbitrary arrays, the compiler needs to know that vectors are not
complex, so you should use `simple-string' in preference to `string',
etc.
The only advantage that lists have over vectors for representing
sequences is that it is easy to change the length of a list, add to it,
and remove items from it. Likely signs of archaic, slow Lisp code are
`nth' and `nthcdr'. If you are using these functions you should
probably be using a vector.
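A sketch of the suggested replacement (the function is made up):

(defun process-items (items)
  ;; Coerce the list to a vector once, then index in constant
  ;; time, rather than calling NTH repeatedly on the list.
  (let ((v (coerce items 'simple-vector)))
    (dotimes (i (length v))
      (print (svref v i)))))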
File: cmu-user.info Node: Bit-Vectors, Prev: Vectors, Up: Object Representation, Next: Hashtables
Bit-Vectors
-----------
Another thing that lists have been used for is set manipulation. In
applications where there is a known, reasonably small universe of
items, bit-vectors can be used to improve performance. This is much less
convenient than using lists, because instead of symbols, each element in
the universe must be assigned a numeric index into the bit vector.
Using a bit-vector will nearly always be faster, and can be tremendously
faster if the number of elements in the set is not small. The logical
operations on `simple-bit-vector's are efficient, since they operate on
a word at a time.
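A sketch of bit-vector set manipulation (the universe size of 256 is
arbitrary):

;; Each item in the universe is assigned an index into the vector.
(let ((a (make-array 256 :element-type 'bit :initial-element 0))
      (b (make-array 256 :element-type 'bit :initial-element 0)))
  (setf (sbit a 3) 1)          ; add item 3 to set A
  (setf (sbit b 3) 1)          ; add item 3 to set B
  (bit-and a b))               ; intersection, a word at a time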
File: cmu-user.info Node: Hashtables, Prev: Bit-Vectors, Up: Object Representation
Hashtables
----------
Hashtables are an efficient and general mechanism for maintaining
associations such as the association between an object and its name.
Although hashtables are usually the best way to maintain associations,
efficiency and style considerations sometimes favor the use of an
association list (a-list).
`assoc' is fairly fast when the TEST argument is `eq' or `eql' and there
are only a few elements, but the time goes up in proportion with the
number of elements. In contrast, the hash-table lookup has a somewhat
higher overhead, but the speed is largely unaffected by the number of
entries in the table. For an `equal' hash-table or alist, hash-tables
have an even greater advantage, since the test is more expensive.
Whatever you do, be sure to use the most restrictive test function
possible.
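The two lookups compared above look like this (the variable names are
hypothetical):

;; A-list: time grows with the number of entries.
(cdr (assoc key alist :test #'eq))

;; Hash-table: higher fixed overhead, but largely independent of
;; the number of entries.
(gethash key table)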
The style argument observes that although hash-tables and alists
overlap in function, they do not do all things equally well.
* Alists are good for maintaining scoped environments. They were
originally invented to implement scoping in the Lisp interpreter,
and are still used for this in Python. With an alist one can
non-destructively change an association simply by consing a new
element on the front. This is something that cannot be done with
hash-tables.
* Hashtables are good for maintaining a global association. The
value associated with an entry can easily be changed with `setf'.
With an alist, one has to go through contortions, either
`rplacd''ing the cons if the entry exists, or pushing a new one if
it doesn't. The side-effecting nature of hash-table operations is
an advantage here.
Historically, symbol property lists were often used for global name
associations. Property lists provide an awkward and error-prone
combination of name association and record structure. If you must use
the property list, please store all the related values in a single
structure under a single property, rather than using many properties.
This makes access more efficient, and also adds a modicum of typing and
abstraction. *Note More About Types in Python:: for information on
types in CMU Common Lisp.
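A sketch of the recommended property list usage (the structure and
property names are made up):

;; Store all related values in one structure under one property,
;; rather than using separate COLOR and SIZE properties.
(defstruct widget-info
  color
  size)

(setf (get 'my-widget 'widget-info)
      (make-widget-info :color 'red :size 3))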
File: cmu-user.info Node: Numbers, Prev: Object Representation, Up: Advanced Compiler Use and Efficiency Hints, Next: General Efficiency Hints
Numbers
=======
Numbers are interesting because numbers are one of the few Common Lisp
data types that have direct support in conventional hardware. If a
number can be represented in the way that the hardware expects it, then
there is a big efficiency advantage.
Using hardware representations is problematical in Common Lisp due to dynamic typing
(where the type of a value may be unknown at compile time.) It is possible to
compile code for statically typed portions of a Common Lisp program with efficiency
comparable to that obtained in statically typed languages such as C, but not
all Common Lisp implementations succeed. There are two main barriers to efficient
numerical code in Common Lisp:
* The compiler must prove that the numerical expression is in fact statically
typed, and
* The compiler must be able to somehow reconcile the conflicting
demands of the hardware mandated number representation with the
Common Lisp requirements of dynamic typing and garbage-collecting
dynamic storage allocation.
Because of its type inference (*Note Type Inference::) and efficiency
notes, Python is better than conventional Common Lisp compilers at
ensuring that numerical expressions are statically typed. Python also
goes somewhat farther than existing compilers in the area of allowing
native machine number representations in the presence of garbage
collection.
* Menu:
* Descriptors::
* Non-Descriptor Representations::
* Variables::
* Generic Arithmetic::
* Fixnums::
* Word Integers::
* Floating Point Efficiency::
* Specialized Arrays::
* Interactions With Local Call::
* Representation of Characters::
File: cmu-user.info Node: Descriptors, Prev: Numbers, Up: Numbers, Next: Non-Descriptor Representations
Descriptors
-----------
Common Lisp's dynamic typing requires that it be possible to represent any value
with a fixed length object, known as a DESCRIPTOR. This fixed-length
requirement is implicit in features such as:
* Data types (like `simple-vector') that can contain any type of object, and
that can be destructively modified to contain different objects (of possibly
different types.)
* Functions that can be called with any type of argument, and that
can be redefined at run time.
In order to save space, a descriptor is invariably represented as a
single word. Objects that can be directly represented in the descriptor
itself are said to be IMMEDIATE. Descriptors for objects larger than
one word are in reality pointers to the memory actually containing the
object.
Representing objects using pointers has two major disadvantages:
* The memory pointed to must be allocated on the heap, so it must
eventually be freed by the garbage collector. Excessive heap
allocation of objects (or "consing") is inefficient in several
ways.
* Representing an object in memory requires the compiler to emit
additional instructions to read the actual value in from memory,
and then to write the value back after operating on it.
The introduction of garbage collection makes things even worse, since the
garbage collector must be able to determine whether a descriptor is an
immediate object or a pointer. This requires that a few bits in each
descriptor be dedicated to the garbage collector. The loss of a few bits
doesn't seem like much, but it has a major efficiency implication -- objects
whose natural machine representation is a full word (integers and
single-floats) cannot have an immediate representation. So the compiler is
forced to use an unnatural immediate representation (such as `fixnum') or a
natural pointer representation (with the attendant consing overhead.)
File: cmu-user.info Node: Non-Descriptor Representations, Prev: Descriptors, Up: Numbers, Next: Variables
Non-Descriptor Representations
------------------------------
From the discussion above, we can see that the standard descriptor
representation has many problems, the worst being number consing.
Common Lisp compilers try to avoid these descriptor efficiency problems by using
NON-DESCRIPTOR representations. A compiler that uses non-descriptor
representations can compile this function so that it does no number consing:
(defun multby (vec n)
(declare (type (simple-array single-float (*)) vec)
(single-float n))
(dotimes (i (length vec))
(setf (aref vec i)
(* n (aref vec i)))))
If a descriptor representation were used, each iteration of the loop
might cons two floats and do three times as many memory references.
As its negative definition suggests, the range of possible
non-descriptor representations is large. The performance improvement
from non-descriptor representation depends upon both the number of types
that have non-descriptor representations and the number of contexts in
which the compiler is forced to use a descriptor representation.
Many Common Lisp compilers support non-descriptor representations for
float types such as `single-float' and `double-float'. Python adds
support for full word integers, characters and system-area pointers
(unconstrained pointers.) Many Common Lisp compilers support
non-descriptor representations for variables and array elements; Python
adds support for non-descriptor arguments and return values in local
call.
File: cmu-user.info Node: Variables, Prev: Non-Descriptor Representations, Up: Numbers, Next: Generic Arithmetic
Variables
---------
In order to use a non-descriptor representation for a variable or expression
intermediate value, the compiler must be able to prove that the value is always
of a particular type having a non-descriptor representation. Type inference
(*Note Type Inference::) often needs some help from user-supplied
declarations. The best kind of type declaration is a variable type declaration
placed at the binding point:
(let ((x (car l)))
(declare (single-float x))
...)
Use of `the', or of variable declarations not at the binding form is
insufficient to allow non-descriptor representation of the variable -- with
these declarations it is not certain that all values of the variable are of the
right type. It is sometimes useful to introduce a gratuitous binding that
allows the compiler to change to a non-descriptor representation, like:
(etypecase x
((signed-byte 32)
(let ((x x))
(declare (type (signed-byte 32) x))
...))
...)
The declaration on the inner `x' is necessary here due to a phase
ordering problem. Although the compiler will eventually prove that the
outer `x' is a `(signed-byte 32)' within that `etypecase' branch, the
inner `x' would have been optimized away by that time. Declaring the
type makes let optimization more cautious.
Note that storing a value into a global (or `special') variable always
forces a descriptor representation. Wherever possible, you should
operate only on local variables, binding any referenced globals to local
variables at the beginning of the function, and doing any global
assignments at the end.
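A sketch of this style (the variable and function are hypothetical):

(defvar *total* 0.0)

(defun add-all (floats)
  ;; Bind the global to a local at entry, operate on the local
  ;; (which can use a non-descriptor representation), and assign
  ;; back at the end.
  (let ((total *total*))
    (declare (single-float total))
    (dolist (f floats)
      (incf total f))
    (setq *total* total)))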
Efficiency notes signal use of inefficient representations, so
programmers needn't continuously worry about the details of
representation selection.
File: cmu-user.info Node: Generic Arithmetic, Prev: Variables, Up: Numbers, Next: Fixnums
Generic Arithmetic
------------------
In CMU Common Lisp, arithmetic operations are GENERIC. (9) (*Note
Generic Arithmetic-Footnotes::) The `+' function can be passed
`fixnum's, `bignum's, `ratio's, and various kinds of `float's and
`complex'es, in any combination. In addition to the inherent complexity
of `bignum' and `ratio' operations, there is also a lot of overhead in
just figuring out which operation to do and what contagion and
canonicalization rules apply. The complexity of generic arithmetic is
so great that it is inconceivable to open code it. Instead, the
compiler does a function call to a generic arithmetic routine, consuming
many instructions before the actual computation even starts.
This is ridiculous, since even Common Lisp programs do a lot of arithmetic, and the
hardware is capable of doing operations on small integers and floats with a
single instruction. To get acceptable efficiency, the compiler special-cases
uses of generic arithmetic that are directly implemented in the hardware. In
order to open code arithmetic, several constraints must be met:
* All the arguments must be known to be a good type of number.
* The result must be known to be a good type of number.
* Any intermediate values such as the result of `(+ a b)' in the call
`(+ a b c)' must be known to be a good type of number.
* All the above numbers with good types must be of the SAME good
type. Don't try to mix integers and floats or different float
formats.
The "good types" are `(signed-byte 32)', `(unsigned-byte 32)',
`single-float' and `double-float'. See the following sections for more
discussion of good numeric types.
`float' is not a good type, since it might mean either `single-float' or
`double-float'. `integer' is not a good type, since it might mean
`bignum'. `rational' is not a good type, since it might mean `ratio'.
Note however that these types are still useful in declarations, since
type inference may be able to strengthen a weak declaration into a good
one, when it would be at a loss if there was no declaration at all
(*Note Type Inference::). The `integer' and `unsigned-byte' (or
non-negative integer) types are especially useful in this regard, since
they can often be strengthened to a good integer type.
Arithmetic with `complex' numbers is inefficient in comparison to float and
integer arithmetic. Complex numbers are always represented with a pointer
descriptor (causing consing overhead), and complex arithmetic is always closed
coded using the general generic arithmetic functions. But arithmetic with
complex types such as:
(complex float)
(complex fixnum)
is still faster than `bignum' or `ratio' arithmetic, since the
implementation is much simpler.
Note: don't use `/' to divide integers unless you want the overhead of
rational arithmetic. Use `truncate' even when you know that the
arguments divide evenly.
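The difference can be seen directly:

(/ 7 2)          ; => 7/2, a ratio -- rational arithmetic
(truncate 7 2)   ; => 3, 1 -- open-coded integer division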
You don't need to remember all the rules for how to get open-coded
arithmetic, since efficiency notes will tell you when and where there is
a problem.
File: cmu-user.info Node: Generic Arithmetic-Footnotes, Up: Generic Arithmetic
(9) As Steele notes in CLTL II, this is a generic conception of generic,
and is not to be confused with the CLOS concept of a generic function.
File: cmu-user.info Node: Fixnums, Prev: Generic Arithmetic, Up: Numbers, Next: Word Integers
Fixnums
-------
A fixnum is a "FIXed precision NUMber". In modern Common Lisp
implementations, fixnums can be represented with an immediate
descriptor, so operating on fixnums requires no consing or memory
references. Clever choice of representations also allows some
arithmetic operations to be done on fixnums using hardware supported
word-integer instructions, somewhat reducing the speed penalty for using
an unnatural integer representation.
It is useful to distinguish the `fixnum' type from the fixnum
representation of integers. In Python, there is absolutely nothing
magical about the `fixnum' type in comparison to other finite integer
types. `fixnum' is equivalent to (is defined with `deftype' to be)
`(signed-byte 30)'. `fixnum' is simply the largest subset of integers
that can be represented using an immediate fixnum descriptor.
Unlike in other Common Lisp compilers, it is in no way desirable to use the
`fixnum' type in declarations in preference to more restrictive integer types
such as `bit', `(integer -43 7)' and `(unsigned-byte 8)'. Since
Python does understand these integer types, it is preferable to use the more
restrictive type, as it allows better type inference (*Note Operation Specific Type Inference::.)
The small, efficient fixnum is contrasted with bignum, or "BIG NUMber".
This is another descriptor representation for integers, but this time a
pointer representation that allows for arbitrarily large integers.
Bignum operations are less efficient than fixnum operations, both
because of the consing and memory reference overheads of a pointer
descriptor, and also because of the inherent complexity of extended
precision arithmetic. While fixnum operations can often be done with a
single instruction, bignum operations are so complex that they are
always done using generic arithmetic.
A crucial point is that the compiler will use generic arithmetic if
it can't PROVE that all the arguments, intermediate values, and results are
fixnums. With bounded integer types such as `fixnum', the result type proves
to be especially problematical, since these types are not closed under
common arithmetic operations such as `+', `-', `*' and `/'. For
example, `(1+ (the fixnum x))' does not necessarily evaluate to a
`fixnum'. Bignums were added to Common Lisp to get around this problem, but they
really just transform the correctness problem "if this add overflows, you will
get the wrong answer" to the efficiency problem "if this add MIGHT overflow
then your program will run slowly (because of generic arithmetic.)"
There is just no getting around the fact that the hardware only directly
supports short integers. To get the most efficient open coding, the
compiler must be able to prove that the result is a good integer type.
This is an argument in favor of using more restrictive integer types:
`(1+ (the fixnum x))' may not always be a `fixnum', but `(1+ (the
(unsigned-byte 8) x))' always is. Of course, you can also assert the
result type by putting in lots of `the' declarations and then compiling
with `safety' `0'.
File: cmu-user.info Node: Word Integers, Prev: Fixnums, Up: Numbers, Next: Floating Point Efficiency
Word Integers
-------------
Python is unique in its efficient implementation of arithmetic
on full-word integers through non-descriptor representations and open coding.
Arithmetic on any subtype of these types:
(signed-byte 32)
(unsigned-byte 32)
is reasonably efficient, although subtypes of `fixnum' remain somewhat
more efficient.
If a word integer must be represented as a descriptor, then the `bignum'
representation is used, with its associated consing overhead. The
support for word integers in no way changes the language semantics, it
just makes arithmetic on small bignums vastly more efficient. It is
fine to do arithmetic operations with mixed `fixnum' and word integer
operands; just declare the most specific integer type you can, and let
the compiler decide what representation to use.
In fact, to most users, the greatest advantage of word integer
arithmetic is that it effectively provides a few guard bits on the
fixnum representation. If there are missing assertions on intermediate
values in a fixnum expression, the intermediate results can usually be
proved to fit in a word. After the whole expression is evaluated, there
will often be a fixnum assertion on the final result, allowing creation
of a fixnum result without even checking for overflow.
The remarks in section *Note Fixnums:: about fixnum result type also
apply to word integers; you must be careful to give the compiler enough
information to prove that the result is still a word integer. This
time, though, when we blow out of word integers we land in generic
bignum arithmetic, which is much worse than sleazing from `fixnum's to
word integers. Note that mixing `(unsigned-byte 32)' arguments with
arguments of any signed type (such as `fixnum') is a no-no, since the
result might not be unsigned.
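A sketch of word-integer declarations (the function is hypothetical):

(defun add32 (a b)
  ;; Both arguments fit in 31 bits, so the sum fits in 32 bits
  ;; and the addition can be open-coded on word integers.
  (declare (type (unsigned-byte 31) a b))
  (the (unsigned-byte 32) (+ a b)))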
File: cmu-user.info Node: Floating Point Efficiency, Prev: Word Integers, Up: Numbers, Next: Specialized Arrays
Floating Point Efficiency
-------------------------
Arithmetic on objects of type `single-float' and `double-float' is
efficiently implemented using non-descriptor representations and open
coding. As for integer arithmetic, the arguments must be known to be of
the same float type. Unlike for integer arithmetic, the results and
intermediate values usually take care of themselves due to the rules of
float contagion, i.e. `(1+ (the single-float x))' is always a
`single-float'.
Although they are not specially implemented, `short-float' and
`long-float' are also acceptable in declarations, since they are
synonyms for the `single-float' and `double-float' types, respectively.
It is harmless to use list-style float type specifiers such as
`(single-float 0.0 1.0)', but Python currently makes little use of
bounds on float types.
When a float must be represented as a descriptor, a pointer
representation is used, creating consing overhead. For this reason, you
should try to avoid situations (such as full call and non-specialized
data structures) that force a descriptor representation. See sections ?
and ?.
*Note Floats:: for information on the extensions to support IEEE
floating point.
File: cmu-user.info Node: Specialized Arrays, Prev: Floating Point Efficiency, Up: Numbers, Next: Interactions With Local Call
Specialized Arrays
------------------
CMU Common Lisp supports specialized array element types through the `:element-type'
argument to `make-array'. When an array has a specialized element type, only
elements of that type can be stored in the array. From this restriction
come two major efficiency advantages:
* A specialized array can save space by packing multiple elements
into a single word. For example, a `base-char' array can have 4
elements per word, and a `bit' array can have 32. This
space-efficient representation is possible because it is not
necessary to separately indicate the type of each element.
* The elements in a specialized array can be given the same
non-descriptor representation as the one used in registers and on
the stack, eliminating the need for representation conversions when
reading and writing array elements. For objects with pointer
descriptor representations (such as floats and word integers) there
is also a substantial consing reduction because it is not necessary
to allocate a new object every time an array element is modified.
These are the specialized element types currently supported:
bit
(unsigned-byte 2)
(unsigned-byte 4)
(unsigned-byte 8)
(unsigned-byte 16)
(unsigned-byte 32)
base-character
single-float
double-float
Although a `simple-vector' can hold any type of object, `t' should
still be considered a specialized array type, since arrays with element
type `t' are specialized to hold descriptors.
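To illustrate, a specialized array is requested by passing the desired
element type to `make-array'; this sketch shows typical usage:

    ;;; A packed bit vector (32 elements per word) and an unboxed
    ;;; double-float vector.
    (make-array 64 :element-type 'bit :initial-element 0)
    (make-array 100 :element-type 'double-float
                    :initial-element 0.0d0)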
When using non-descriptor representations, it is particularly important
to make sure that array accesses are open-coded, since in addition to
the generic operation overhead, efficiency is lost when the array
element is converted to a descriptor so that it can be passed to (or
from) the generic access routine. You can detect inefficient array
accesses by enabling efficiency notes (*Note Efficiency Notes::.)
*Note Arrays:: for more general information on arrays.
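As a sketch of how to get open-coded accesses, declare the full array
type (including the element type) at the point of use; `vector-sum' is
a hypothetical name:

    ;;; With this declaration, AREF can be open-coded and the
    ;;; double-float elements need never be converted to descriptors.
    (defun vector-sum (a)
      (declare (type (simple-array double-float (*)) a))
      (let ((sum 0d0))
        (declare (double-float sum))
        (dotimes (i (length a) sum)
          (incf sum (aref a i)))))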
File: cmu-user.info Node: Interactions With Local Call, Prev: Specialized Arrays, Up: Numbers, Next: Representation of Characters
Interactions With Local Call
----------------------------
Local call has many advantages (*Note Local Call::); one relevant to
our discussion here is that local call extends the usefulness of non-descriptor
representations. If the compiler knows from the argument type that an argument
has a non-descriptor representation, then the argument will be passed in that
representation. The easiest way to ensure that the argument type is known at
compile time is to always declare the argument type in the called function,
like:
(defun 2+f (x)
(declare (single-float x))
(+ x 2.0))
The advantages of passing arguments and return values in a
non-descriptor representation are the same as for non-descriptor
representations in general: reduced consing and memory access (*Note
Non-Descriptor Representations::.) This extends the applicative
programming styles discussed in section *Note Local Call:: to numeric
code. Also, if source files are kept reasonably small, block
compilation can be used to reduce number consing to a minimum.
Note that non-descriptor return values can only be used with the known return
convention (section *Note Return Values::.) If the compiler can't prove that
a function always returns the same number of values, then it must use the
unknown values return convention, which requires a descriptor representation.
Pay attention to the known return efficiency notes to avoid number consing.
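One way to make the number and type of return values known to the
compiler is a global `ftype' declaration; this is a sketch with a
hypothetical function name:

    ;;; Declares that SCALE returns exactly one single-float, so
    ;;; callers can use the known-values return convention.
    (declaim (ftype (function (single-float single-float)
                              (values single-float))
                    scale))
    (defun scale (x factor)
      (declare (single-float x factor))
      (* x factor))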
File: cmu-user.info Node: Representation of Characters, Prev: Interactions With Local Call, Up: Numbers
Representation of Characters
----------------------------
Python also uses a non-descriptor representation for characters when
convenient. This improves the efficiency of string manipulation, but is
otherwise pretty invisible; characters have an immediate descriptor
representation, so there is not a great penalty for converting a
character to a descriptor. Nonetheless, it may sometimes be helpful to
declare character-valued variables as `base-character'.
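For example (a sketch; `count-char' is a hypothetical name):

    ;;; Declaring the string and character types lets the compiler
    ;;; use a non-descriptor character representation in the loop.
    (defun count-char (string char)
      (declare (simple-string string)
               (base-character char))
      (let ((count 0))
        (declare (fixnum count))
        (dotimes (i (length string) count)
          (when (char= (schar string i) char)
            (incf count)))))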
File: cmu-user.info Node: General Efficiency Hints, Prev: Numbers, Up: Advanced Compiler Use and Efficiency Hints, Next: Efficiency Notes
General Efficiency Hints
========================
This section is a summary of various implementation costs and ways to
get around them. These hints are relatively unrelated to the use of the
Python compiler, and probably also apply to most other Common Lisp
implementations. In each section, there are references to related
in-depth discussion.
* Menu:
* Compile Your Code::
* Avoid Unnecessary Consing::
* Complex Argument Syntax::
* Mapping and Iteration::
* Trace Files and Disassembly::
File: cmu-user.info Node: Compile Your Code, Prev: General Efficiency Hints, Up: General Efficiency Hints, Next: Avoid Unnecessary Consing
Compile Your Code
-----------------
At this point, the advantages of compiling code relative to running it
interpreted probably need not be emphasized too much, but remember that
in CMU Common Lisp, compiled code typically runs hundreds of times
faster than interpreted code. Also, compiled (`fasl') files load
significantly faster than source files, so it is worthwhile compiling
files which are loaded many times, even if the speed of the functions in
the file is unimportant.
Even disregarding the efficiency advantages, compiled code is as good or
better than interpreted code. Compiled code can be debugged at the
source level (see chapter *Note The Debugger::), and compiled code does
more error checking. For these reasons, the interpreter should be
regarded mainly as an interactive command interpreter, rather than as a
programming language implementation.
Do not be concerned about the performance of your program until you see
its speed compiled. Some techniques that make compiled code run faster
make interpreted code run slower.
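A minimal compile-and-load cycle looks like this (the file name is
hypothetical, and the extension of the compiled output file is
implementation dependent):

    (compile-file "frob.lisp")   ; writes a compiled fasl file
    (load "frob")                ; loads the compiled version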
File: cmu-user.info Node: Avoid Unnecessary Consing, Prev: Compile Your Code, Up: General Efficiency Hints, Next: Complex Argument Syntax
Avoid Unnecessary Consing
-------------------------
Consing is another name for allocation of storage, as done by
the `cons' function (hence its name.) `cons' is by no means the only
function which conses -- so do `make-array' and many other functions.
Arithmetic and function call can also have hidden consing overheads. Consing
hurts performance in the following ways:
* Consing reduces memory access locality, increasing paging activity.
* Consing takes time just like anything else.
* Any space allocated eventually needs to be reclaimed, either by
garbage collection or by starting a new `lisp' process.
Consing is not undiluted evil, since programs do things other than
consing, and appropriate consing can speed up the real work. It would
certainly save time to allocate a vector of intermediate results that
are reused hundreds of times. Also, if it is necessary to copy a large
data structure many times, it may be more efficient to update the data
structure non-destructively; this somewhat increases update overhead,
but makes copying trivial.
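As a sketch of appropriate consing, a vector of intermediate results
can be allocated once and reused across calls. The names here are
hypothetical, the inputs are assumed non-negative and no longer than
the scratch vector, and sharing a scratch vector this way is not
reentrant:

    ;;; One scratch vector, allocated once instead of consing a
    ;;; fresh list of intermediate results on every call.
    (defvar *scratch* (make-array 1024 :element-type 'single-float))

    (defun sum-roots (v)
      (declare (type (simple-array single-float (*)) v))
      (let ((tmp *scratch*))
        (declare (type (simple-array single-float (*)) tmp))
        (dotimes (i (length v))
          (setf (aref tmp i) (sqrt (aref v i))))
        (let ((sum 0f0))
          (declare (single-float sum))
          (dotimes (i (length v) sum)
            (incf sum (aref tmp i))))))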
Note that the remarks in section *Note Writing Efficient Code:: about
the importance of separating tuning from coding also apply to consing
overhead. The majority of consing will be done by a small portion of
the program. The consing hot spots are even less predictable than the
CPU hot spots, so don't waste time and create bugs by doing unnecessary
consing optimization. During initial coding, avoid unnecessary
side-effects and cons where it is convenient. If profiling reveals a
consing problem, THEN go back and fix the hot spots.
*Note Non-Descriptor Representations:: for a discussion of how to avoid
number consing in Python.
File: cmu-user.info Node: Complex Argument Syntax, Prev: Avoid Unnecessary Consing, Up: General Efficiency Hints, Next: Mapping and Iteration
Complex Argument Syntax
-----------------------
Common Lisp has very powerful argument passing mechanisms. Unfortunately, two
of the most powerful mechanisms, rest arguments and keyword arguments, have a
significant performance penalty:
* With keyword arguments, the called function has to parse the
supplied keywords by iterating over them and checking them against
the desired keywords.
* With rest arguments, the function must cons a list to hold the
arguments. If a function is called many times or with many
arguments, large amounts of memory will be allocated.
Although rest argument consing is worse than keyword parsing, neither
problem is serious unless thousands of calls are made to such a
function. The use of keyword arguments is strongly encouraged in
functions with many arguments or with interfaces that are likely to be
extended, and rest arguments are often natural in user interface
functions.
Optional arguments have some efficiency advantage over keyword
arguments, but their syntactic clumsiness and lack of extensibility has
caused many CMU Common Lisp programmers to abandon use of optionals
except in functions that have obviously simple and immutable interfaces
(such as `subseq'), or in functions that are only called in a few
places. When defining an interface function to be used by other
programmers or users, use of only required and keyword arguments is
recommended.
Parsing of `defmacro' keyword and rest arguments is done at compile
time, so a macro can be used to provide a convenient syntax with an
efficient implementation. If the macro-expanded form contains no
keyword or rest arguments, then it is perfectly acceptable in inner
loops.
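For example, a macro can present a keyword-argument syntax whose
parsing cost is paid entirely at compile time (a sketch; the macro
name is hypothetical):

    (defmacro with-step (place &key (delta 1))
      ;; The :delta keyword is parsed at macroexpansion time; the
      ;; expansion is a plain INCF with no run-time keyword parsing.
      `(incf ,place ,delta))

So `(with-step counter :delta 5)' simply expands to
`(incf counter 5)'.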
Keyword argument parsing overhead can also be avoided by use of inline
expansion (*Note Inline Expansion::) and block compilation (section
*Note Block Compilation::.)
Note: the compiler open-codes most heavily used system functions which
have keyword or rest arguments, so that no run-time overhead is
involved.
File: cmu-user.info Node: Mapping and Iteration, Prev: Complex Argument Syntax, Up: General Efficiency Hints, Next: Trace Files and Disassembly
Mapping and Iteration
---------------------
One of the traditional Common Lisp programming styles is a highly applicative one,
involving the use of mapping functions and many lists to store intermediate
results. To compute the sum of the square-roots of a list of numbers, one
might say:
(apply #'+ (mapcar #'sqrt list-of-numbers))
This programming style is clear and elegant, but unfortunately results
in slow code. There are two reasons why:
* The creation of lists of intermediate results causes much consing
(see *Note Avoid Unnecessary Consing::).
* Each level of application requires another scan down the list.
Thus, disregarding other effects, the above code would probably
take twice as long as a straightforward iterative version.
An example of an iterative version of the same code:
(do ((num list-of-numbers (cdr num))
(sum 0 (+ (sqrt (car num)) sum)))
((null num) sum))
See sections *Note Variable Type Inference:: and *Note Let
Optimization:: for a discussion of the interactions of iteration
constructs with type inference and variable optimization. Also, section
*Note Local Tail Recursion:: discusses an applicative style of
iteration.
File: cmu-user.info Node: Trace Files and Disassembly, Prev: Mapping and Iteration, Up: General Efficiency Hints
Trace Files and Disassembly
---------------------------
In order to write efficient code, you need to know the relative costs of
different operations. The main reason why writing efficient Common Lisp
code is difficult is that there are so many operations, and the costs of
these operations vary in obscure context-dependent ways. Although
efficiency notes point out some problem areas, the only way to ensure
generation of the best code is to look at the assembly code output.
The `disassemble' function is a convenient way to get the assembly code
for a function, but it can be very difficult to interpret, since the
correspondence with the original source code is weak. A better (but
more awkward) option is to use the :trace-file argument to
`compile-file' to generate a trace file.
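For example:

    ;;; Writes the annotated internal representations to frob.trace
    ;;; (the file names here are hypothetical.)
    (compile-file "frob.lisp" :trace-file "frob.trace")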
A trace file is a dump of the compiler's internal representations, including
annotated assembly code. Each component in the program gets three pages in
the trace file (separated by "`^L'"):
* The implicit-continuation (or IR1) representation of the optimized
source. This is a dump of the flow graph representation used for
"source level" optimizations. As you will quickly notice, it is
not really very close to the source. This representation is not
very useful to even sophisticated users.
* The Virtual Machine (VM, or IR2) representation of the program.
This dump represents the generated code as sequences of "Virtual
OPerations" (VOPs.) This representation is intermediate between
the source and the assembly code -- each VOP corresponds fairly
directly to some primitive function or construct, but a given VOP
also has a fairly predictable instruction sequence. An operation
(such as `+') may have multiple implementations with different cost
and applicability. The choice of a particular VOP such as
`+/fixnum' or `+/single-float' represents this choice of
implementation. Once you are familiar with it, the VM
representation is probably the most useful for determining what
implementation has been used.
* An assembly listing, annotated with the VOP responsible for
generating the instructions. This listing is useful for figuring
out what a VOP does and how it is implemented in a particular
context, but its large size makes it more difficult to read.
Note that trace file generation takes much space and time, since the
trace file is tens of times larger than the source file. To avoid huge
confusing trace files and much wasted time, it is best to separate the
critical program portion into its own file and then generate the trace
file from this small file.
File: cmu-user.info Node: Efficiency Notes, Prev: General Efficiency Hints, Up: Advanced Compiler Use and Efficiency Hints, Next: Profiling
Efficiency Notes
================
Efficiency notes are messages that warn the user that the compiler has
chosen a relatively inefficient implementation for some operation.
Usually an efficiency note reflects the compiler's desire for more type
information. If the type of the values concerned is known to the
programmer, then additional declarations can be used to get a more
efficient implementation.
Efficiency notes are controlled by the `extensions:inhibit-warnings'
optimization quality (*Note The Optimize Declaration::.) When `speed'
is greater than `extensions:inhibit-warnings', efficiency notes are
enabled. Note that this implicitly enables efficiency notes whenever
`speed' is increased from its default of `1'.
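For example, this global declaration enables efficiency notes, since
`speed' is greater than `extensions:inhibit-warnings':

    (declaim (optimize (speed 3) (extensions:inhibit-warnings 0)))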
Consider this program with an obscure missing declaration:
(defun eff-note (x y z)
(declare (fixnum x y z))
(the fixnum (+ x y z)))
If compiled with `(speed 3) (safety 0)', this note is given:
In: DEFUN EFF-NOTE
(+ X Y Z)
==>
(+ (+ X Y) Z)
Note: Forced to do inline (signed-byte 32) arithmetic (cost 3).
Unable to do inline fixnum arithmetic (cost 2) because:
The first argument is a (INTEGER -1073741824 1073741822),
not a FIXNUM.
This efficiency note tells us that the result of the intermediate computation
`(+ x y)' is not known to be a `fixnum', so the addition of the
intermediate sum to `z' must be done less efficiently. This can be fixed by
changing the definition of `eff-note':
(defun eff-note (x y z)
(declare (fixnum x y z))
(the fixnum (+ (the fixnum (+ x y)) z)))
* Menu:
* Type Uncertainty::
* Efficiency Notes and Type Checking::
* Representation Efficiency Notes::
* Verbosity Control::
File: cmu-user.info Node: Type Uncertainty, Prev: Efficiency Notes, Up: Efficiency Notes, Next: Efficiency Notes and Type Checking
Type Uncertainty
----------------
The main cause of inefficiency is the compiler's lack of adequate
information about the types of function argument and result values.
Many important operations (such as arithmetic) have an inefficient
general (generic) case, but have efficient implementations that can
usually be used if there is sufficient argument type information.
Type efficiency notes are given when a value's type is uncertain. There is an
important distinction between values that are not known to be of a good
type (uncertain) and values that are known not to be of a good type.
Efficiency notes are given mainly for the first case (uncertain types.) If it
is clear to the compiler that there is not an efficient implementation for
a particular function call, then an efficiency note will only be given if the
`extensions:inhibit-warnings' optimization quality is `0' (*Note The Optimize Declaration::.)
In other words, the default efficiency notes only suggest that you add
declarations, not that you change the semantics of your program so that an
efficient implementation will apply. For example, compilation of this form
will not give an efficiency note:
(elt (the list l) i)
even though a vector access is more efficient than indexing a list.
File: cmu-user.info Node: Efficiency Notes and Type Checking, Prev: Type Uncertainty, Up: Efficiency Notes, Next: Representation Efficiency Notes
Efficiency Notes and Type Checking
----------------------------------
It is important that the `eff-note' example above used
`(safety 0)'. When type checking is enabled, you may get apparently
spurious efficiency notes. With `(safety 1)', the note has this extra
line on the end:
The result is a (INTEGER -1610612736 1610612733), not a FIXNUM.
This seems strange, since there is a `the' declaration on the result of
that second addition.
In fact, the inefficiency is real, and is a consequence of Python's
treating declarations as assertions to be verified. The compiler can't
assume that the result type declaration is true -- it must generate the
result and then test whether it is of the appropriate type.
In practice, this means that when you are tuning a program to run without type
checks, you should work from the efficiency notes generated by unsafe
compilation. If you want code to run efficiently with type checking, then you
should pay attention to all the efficiency notes that you get during safe
compilation. Since user-supplied output type assertions (e.g., from `the')
are disregarded when selecting operation implementations for safe code, you
must somehow give the compiler information that allows it to prove that the
result truly must be of a good type. In our example, it could be done by
constraining the argument types more:
(defun eff-note (x y z)
(declare (type (unsigned-byte 18) x y z))
(+ x y z))
Of course, this declaration is acceptable only if the arguments to
`eff-note' always ARE `(unsigned-byte 18)' integers.